An Online Nanoinformatics Platform Empowering Computational Modeling of Nanomaterials by Nanostructure Annotations and Machine Learning Toolkits

Modern nanotechnology has generated numerous datasets from in vitro and in vivo studies on nanomaterials, with some available on nanoinformatics portals. However, these existing databases lack the digital data and tools suitable for machine learning studies. Here, we report a nanoinformatics platform that accurately annotates nanostructures into machine-readable data files and provides modeling toolkits. This platform, accessible to the public at https://vinas-toolbox.com/, has annotated nanostructures of 14 material types. The associated nanodescriptor data and assay test results are appropriate for modeling purposes. The modeling toolkits enable data standardization, data visualization, and machine learning model development to predict properties and bioactivities of new nanomaterials. Moreover, a library of virtual nanostructures with their predicted properties and bioactivities is available to direct the synthesis of new nanomaterials. The platform offers the nanoscience community a data-driven computational modeling resource, significantly aiding in the development of safe and effective nanomaterials.

Endpoint Profile modules. ViNAS-Pro also provides services for data deposit, nanostructure construction, and nanodescriptor calculation through the Service component.

The services of data deposit and calculation on ViNAS-Pro
To facilitate data sharing within the nanoscience community, ViNAS-Pro provides a Data Deposit interface for users to deposit data into ViNAS-Pro databases (Figure S10A). Depositors can deposit nanostructure data in PDB format, as well as nanodescriptor data and assay data in CSV/XLSX format. After in-house data cleaning and validation, the uploaded data will be integrated into the ViNAS-Pro databases.

ViNAS-Pro also provides a nanostructure construction and nanodescriptor calculation service for new NMs (Figure S10B). Users can request the service through the Calculation Service interface by providing basic information about the NM. When requesting nanodescriptor calculation, users are encouraged to provide the corresponding nanostructures in PDB format. Users will then receive the calculation results by email, which can be used for modeling and other nanoinformatics tasks.

A case study using ViNAS-Pro for nanomaterial design
A case study describing the design of a new gold nanoparticle (GNP) is shown in Figure S11. First, the user sets up the structural parameters for constructing the desired GNP, including material type, shape, size, core, ligand SMILES, and ligand density (Figure S11A). This information is then submitted through the Calculation Service interface (https://vinas-toolbox.com/calculation_service), and the user receives a PDB file storing the nanostructure information and an Excel file containing the nanodescriptors (Figure S11B). By uploading the calculated nanodescriptors, the user can apply pre-developed models on the NanoPredictor interface to predict the properties and bioactivities of this new GNP, such as ROS in A549 cells and zeta potential in water, from NanoAID-15 (https://vinas-toolbox.com/model_developed_aid15) and NanoAID-16 (https://vinas-toolbox.com/model_developed_aid16) (Figure S11C). The user can also develop and use their own models for predictions, as introduced in the Document Tutorial on ViNAS-Pro (https://vinas-toolbox.com/tutorial). Finally, the new GNP with the desired predicted properties/activities can be experimentally synthesized (Figure S11D). The related data for this case study are also available in the supporting information.

Online platform implementation
The backend of ViNAS-Pro relies on the Python-based Flask framework to manage server-side logic, handling user requests and performing the corresponding operations. On the frontend, HTML and JavaScript were employed to create the user interface. A variety of open-source libraries, such as SQLite, 3Dmol, DataTables, and Plotly, were used for data storage, visualization, and manipulation during the data analysis and modeling process. These open-source libraries support the necessary functions of ViNAS-Pro and can be updated easily when newer versions become available.
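As a rough illustration of the kind of record lookup a backend request handler might perform against SQLite, the following sketch uses Python's standard `sqlite3` module. The table name, column names, record IDs, and values are all hypothetical and made up for this example; they do not reflect the actual ViNAS-Pro schema or data.

```python
import sqlite3


def create_demo_db():
    """Build an in-memory SQLite database with a hypothetical NM record table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can then be converted to dicts
    conn.execute(
        """CREATE TABLE nm_record (
               nm_id TEXT PRIMARY KEY,
               material_type TEXT NOT NULL,
               core_size_nm REAL,
               ligand_smiles TEXT)"""
    )
    # Made-up demo rows, not real ViNAS-Pro records.
    conn.executemany(
        "INSERT INTO nm_record VALUES (?, ?, ?, ?)",
        [
            ("DEMO001", "gold nanoparticle", 5.0, "CCCCS"),
            ("DEMO002", "gold nanoparticle", 10.0, "OCCS"),
            ("DEMO003", "platinum nanoparticle", 3.0, "CCCCS"),
        ],
    )
    conn.commit()
    return conn


def records_by_material(conn, material_type):
    """Return all demo records of one material type as plain dictionaries."""
    rows = conn.execute(
        "SELECT nm_id, core_size_nm, ligand_smiles FROM nm_record "
        "WHERE material_type = ? ORDER BY nm_id",
        (material_type,),  # parameterized query avoids SQL injection
    ).fetchall()
    return [dict(row) for row in rows]


conn = create_demo_db()
gold_records = records_by_material(conn, "gold nanoparticle")
for record in gold_records:
    print(record)
```

In a Flask application, a function like `records_by_material` would typically be called from a route handler, with the result serialized for a table widget such as DataTables.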

Experimental data collection and curation
The ViNAS-Pro database was compiled from 328 unique NM records from in-house studies and 422 unique NM records from external data. The external data were manually collected from the literature, and most of these data have been used in our previous modeling studies [1-3]. To ensure quality, data were incorporated into the database only if the following conditions were met: (1) basic information about the NMs, such as core shape and size, was provided in the original sources; (2) surface chemistry information was included, and the surface ligand structure could be annotated in Simplified Molecular Input Line Entry System (SMILES) format; (3) property/bioactivity/toxicity data were available for each NM record. The details of the data curation process were described in our previous studies [1,2].
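The three inclusion conditions above can be expressed as a simple record filter. The sketch below is only illustrative: the field names (`core_shape`, `core_size_nm`, `ligand_smiles`, `endpoints`) and the demo records are assumptions, and the actual curation was performed manually as described in the cited studies.

```python
def passes_curation(record):
    """Check a candidate NM record against the three inclusion conditions."""
    # (1) Basic core information (shape and size) is reported.
    has_core_info = (
        bool(record.get("core_shape"))
        and record.get("core_size_nm") is not None
    )
    # (2) Surface chemistry is available as a SMILES string.
    has_surface_smiles = bool(record.get("ligand_smiles"))
    # (3) At least one property/bioactivity/toxicity value is present.
    has_endpoint_data = len(record.get("endpoints", {})) > 0
    return has_core_info and has_surface_smiles and has_endpoint_data


# Hypothetical candidate records with made-up values.
candidates = [
    {"nm_id": "DEMO-A", "core_shape": "sphere", "core_size_nm": 5.0,
     "ligand_smiles": "CCCCS", "endpoints": {"zeta_potential_mV": -12.3}},
    {"nm_id": "DEMO-B", "core_shape": "sphere", "core_size_nm": 8.0,
     "ligand_smiles": "", "endpoints": {"zeta_potential_mV": 4.1}},
    {"nm_id": "DEMO-C", "core_shape": "rod", "core_size_nm": 20.0,
     "ligand_smiles": "OCCS", "endpoints": {}},
]

accepted_ids = [r["nm_id"] for r in candidates if passes_curation(r)]
print(accepted_ids)
```

Here DEMO-B fails condition (2) because no ligand SMILES is given, and DEMO-C fails condition (3) because it has no endpoint data.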

Nanostructure annotation and nanodescriptor generation
For the structure annotation of nanoparticles, the core atoms were first assembled into a nano core based on the particle size and shape information. The associated surface ligands/atoms were then randomly distributed on the core surface [1,2]. All annotated nanostructures on ViNAS-Pro were saved in PDB format. Because of the large sizes of microplastics, we employed a size scaling-down technique to construct microplastic structures, which facilitated the annotation and storage of their structural information in PDB files [2]. For example, we reduced the sizes of four microplastics (MP001 to MP004) in the dataset (NanoAID-27) by a factor of 70 to improve the efficiency of structure construction. We also provide this information for users on the corresponding NM record and assay pages.

In our previous studies, we developed novel geometrical nanodescriptors by employing Delaunay tessellation and atomic properties [2,4]. Each set of four nearest-neighbor atoms that can form a tetrahedron in a nanostructure was identified as a nanodescriptor. These nanodescriptors quantify nanostructures by simulating the surface chemistry of NMs for modeling purposes. Nanodescriptors for each NM record on ViNAS-Pro were calculated using in-house scripts (coded in C++/Java 1.8.0_301) and saved in XLSX format.
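To make the tessellation step concrete, the following sketch applies `scipy.spatial.Delaunay` to a toy set of 3D atom coordinates and groups the resulting tetrahedra by element composition. It is a simplified stand-in, not the in-house C++/Java implementation: the coordinates are random, the element assignment is arbitrary, and counting tetrahedron compositions replaces the atomic-property weighting used in the published nanodescriptors.

```python
from collections import Counter

import numpy as np
from scipy.spatial import Delaunay

# Toy "nanostructure": 30 random atom positions (angstrom) with arbitrary
# element labels. Real inputs would come from an annotated PDB file.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 3))
elements = np.array(["Au"] * 20 + ["S"] * 6 + ["C"] * 4)

# 3D Delaunay tessellation: each simplex is a tetrahedron defined by the
# indices of four neighboring atoms.
tessellation = Delaunay(coords)
tetrahedra = tessellation.simplices  # shape: (n_tetrahedra, 4)


def tetrahedron_type_counts(simplices, element_labels):
    """Count tetrahedra by the sorted element composition of their vertices."""
    counts = Counter()
    for simplex in simplices:
        key = "-".join(sorted(element_labels[simplex]))
        counts[key] += 1
    return counts


descriptor = tetrahedron_type_counts(tetrahedra, elements)
for composition, count in sorted(descriptor.items()):
    print(composition, count)
```

Each key such as `Au-Au-Au-S` acts as one descriptor dimension, so nanostructures with different surface chemistry yield different count vectors.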

Machine learning toolkits construction
The Descriptor and Model toolkits were developed in Python using various Python libraries such as scikit-learn. The Descriptor toolkit incorporates principal component analysis (PCA) to transform high-dimensional data into lower-dimensional representations, enabling users to analyze and visualize the chemical space of NMs. The Model toolkit's NanoPredictor module provides in-house ML models for predictions, and the AutoNanoML module implements the linear regression (LR) and partial least squares regression (PLSR) algorithms for ML modeling. LR is a classic algorithm for developing regression models that predict various endpoints of NMs, such as cellular uptake, viability, apoptosis, and oxidative stress [5,6]. PLSR, which combines features of PCA and multiple linear regression, reduces descriptor dimensions and constructs components that account for dataset variance, helping to avoid multicollinearity and overfitting. It is suitable for modeling small training sets with large descriptor sets. A cross-validation procedure was implemented to find the optimal parameters for modeling. The coefficient of determination (R²) and root mean square error (RMSE) were used as key metrics to evaluate the resulting models, as described in our previous study [2].

Virtual nanomaterial library construction
To construct the virtual NM library, we created an NM dataset containing basic structural information by rationalizing structural parameters. The ranges of these parameter values were based on the same parameters reported in the experimental details of the references. The virtual nanostructures, which have never been synthesized, were constructed using in-house scripts (coded in Python 3.8) that took the basic structural information from the library dataset as input parameters. The annotated nanostructures in the library were saved in PDB format. Geometrical nanodescriptors were calculated from the generated nanostructures using in-house scripts (C++/Java 1.8.0_301), and these descriptors were used with pre-developed ML models to predict the properties, bioactivities, and toxicities of the new NMs in the library.

Abbreviations: v-Gs (virtual graphenes), v-rGOs (virtual reduced graphene oxide), v-GOs (virtual graphene oxide), exp-GRMs (experimental graphene-related materials), exp-Gs (experimental graphenes), exp-rGO-s (experimental reduced graphene oxide in small size), exp-rGO-l (experimental reduced graphene oxide in large size), exp-GO-s (experimental graphene oxide in small size), exp-GO-l (experimental graphene oxide in large size).
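The first step of the library construction, building a dataset of structural parameter combinations, can be sketched as a simple grid enumeration. The parameter names follow those listed for the Calculation Service (material type, shape, size, ligand SMILES, ligand density), but the specific values, the ID scheme, and the function name are hypothetical; the actual ranges were taken from the experimental references.

```python
from itertools import product

# Hypothetical parameter ranges for illustration only.
PARAMETER_GRID = {
    "material_type": ["gold nanoparticle"],
    "shape": ["sphere"],
    "core_size_nm": [2.0, 5.0, 10.0],
    "ligand_smiles": ["CCCCS", "OCCS"],
    "ligand_density_per_nm2": [1.0, 3.0],
}


def enumerate_virtual_library(grid):
    """Create one library entry per combination of structural parameters."""
    keys = list(grid)
    library = []
    combinations = product(*(grid[key] for key in keys))
    for index, values in enumerate(combinations, start=1):
        entry = dict(zip(keys, values))
        entry["virtual_id"] = f"v-DEMO-{index:04d}"  # made-up ID scheme
        library.append(entry)
    return library


library = enumerate_virtual_library(PARAMETER_GRID)
print(len(library), "virtual NMs")
print(library[0])
```

Each entry would then serve as the input to the structure-construction script, followed by nanodescriptor calculation and prediction with the pre-developed models.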

Figure S5. Endpoint Profiling of virtual 2DNMs in the library. (A) Users can access the predictions of properties/bioactivities/toxicities for virtual 2DNMs through the Endpoint Profile interface, with options to download these results in batches. (B) By clicking on a specific virtual 2DNM in the interactive table, users will be directed to its detailed record page (red arrow). This page displays the 2DNM's structural information and provides downloadable access to its structure data, descriptor data, and prediction results. The number of heavy metals on the surface of virtual PS is also different. The virtual nanostructures are rendered using the VDW drawing method in VMD.

Figure S1. Schematic overview of the ViNAS-Pro platform architecture. ViNAS-Pro consists of

Figure S2. Performing prediction by the pre-developed models through the NanoPredictor

Figure S3. Developing the linear regression model for prediction through the AutoNanoML

Figure S4. Exploratory data analysis of virtual 2DNMs in the library. (A) The histogram shows

Figure S6. Visualization of representative virtual PtNPs in the library. The virtual PtNPs are

Figure S7. Visualization of representative virtual PS in the library. The virtual PS are

Figure S8. Exploratory data analysis of virtual PtNPs in the library. (A) The histogram shows

Figure S9. Exploratory data analysis of virtual PS in the library. (A) The histogram shows the

Figure S10. Overview of services provided by ViNAS-Pro. (A) The Data Deposit interface and

Figure S11. A case study on designing new nanomaterials using ViNAS-Pro. (A) Users request